Abstract - In applications such as social, energy, transportation,
sensor, and neuronal networks, high-dimensional data naturally
reside on the vertices of weighted graphs. The emerging
field of signal processing on graphs merges algebraic and spectral 
graph theoretic concepts with computational harmonic analysis 
to process such signals on graphs. In this tutorial overview, we 
outline the main challenges of the area, discuss different ways 
to define graph spectral domains, which are the analogues to 
the classical frequency domain, and highlight the importance of 
incorporating the irregular structures of graph data domains 
when processing signals on graphs. We then review methods to 
generalize fundamental operations such as filtering, translation, 
modulation, dilation, and downsampling to the graph setting, 
and survey the localized, multiscale transforms that have been 
proposed to efficiently extract information from high-dimensional 
data on graphs. We conclude with a brief discussion of open issues 
and possible extensions. 



I. Introduction 

Graphs are generic data representation forms which are 
useful for describing the geometric structures of data domains 
in numerous applications, including social, energy, transporta- 
tion, sensor, and neuronal networks. The weight associated 
with each edge in the graph often represents the similarity 
between the two vertices it connects. The connectivities and 
edge weights are either dictated by the physics of the problem 
at hand or inferred from the data. For instance, the edge 
weight may be inversely proportional to the physical distance 
between nodes in the network. The data on these graphs can be 
visualized as a finite collection of samples, with one sample 
at each vertex in the graph. Collectively, we refer to these 
samples as a graph signal. An example of a graph signal is 
shown in Figure 1.



Fig. 1. A random positive graph signal on the vertices of the Petersen graph. 



We find examples of graph signals in many different en- 
gineering and science fields. In transportation networks, we 
may be interested in analyzing epidemiological data describing 
the spread of disease, census data describing human migration 
patterns, or logistics data describing inventories of trade goods 
(e.g., gasoline or grain stocks). In brain imaging, it is now
possible to non-invasively infer the anatomical connectivity
of distinct functional regions of the cerebral cortex [1], and
this connectivity can be represented by a weighted graph 
with the vertices corresponding to the functional regions of 
interest. Thus, noisy fMRI images can be viewed as signals 
on weighted graphs. Weighted graphs are also commonly 
used to represent similarities between data points in statistical 
learning problems for applications such as machine vision 
[2] and automatic text classification [3]. In fact, much of
the literature on graph-based data analysis techniques em- 
anates from the statistical learning community, as graph-based 
methods became especially popular for the semi-supervised 
learning problem where the objective is to classify unknown 
data with the help of a few labelled samples (e.g., [4]-[9]). In
image processing, there has been a recent spike in graph-based 
filtering methods that build non-local and semi-local graphs 
to connect the pixels of the image based not only on their 
physical proximity, but also on noisy versions of the image 
to be processed (e.g., [10]-[12] and references therein). Such
methods are often able to better recognize and account for 
image edges and textures. 

Depending on the application objectives, one might be 
interested in filtering, denoising, inpainting, or compressing 
graph signals. How can data be processed on irregular data 
domains such as arbitrary graphs? What are the best ways to 
efficiently extract information, either statistically or visually, 
from this high-dimensional data, for the purposes of storage, 
communication, and analysis? Is it possible to use operators 
or algorithms from the classical digital signal processing 
toolboxes? 

A. The Main Challenges of Signal Processing on Graphs 

The ability of wavelet, time-frequency, curvelet and other 
localized transforms to sparsely represent different classes of 
high-dimensional data such as audio signals and images that lie 
on regular Euclidean spaces has led to a number of resounding 
successes in the aforementioned signal processing tasks (see, 
e.g., [13, Section II] for a recent survey of transform methods).

Both a signal on a graph with $N$ vertices and a classical
discrete-time signal with $N$ samples can be viewed as vectors
in $\mathbb{R}^N$. However, a major obstacle to the application of the
classical signal processing techniques in the graph setting
is that processing the graph signal in the same ways as a
discrete-time signal ignores key dependencies arising from the
irregular data domain.¹ Moreover, many extremely simple yet
fundamental concepts that underlie classical signal processing
techniques become significantly more challenging in the graph
setting:

• To translate an analog signal $f(t)$ to the right by 3, we
simply perform a change of variable and consider $f(t-3)$.
However, it is not immediately clear what it means to
translate a graph signal "to the right by 3." The change
of variable technique will not work as there is no meaning
to $f(\circ - 3)$ in the graph setting. One naive option would
be to simply label the vertices from 1 to $N$ and define
$f(\circ - 3) := f(\mathrm{mod}(\circ - 3, N))$, but it is not particularly
useful to define a generalized translation that depends
heavily on the order in which we (arbitrarily) label the
vertices. The unavoidable fact is that weighted graphs
are irregular structures that lack a shift-invariant notion
of translation.²

• Modulating a signal on the real line by multiplying 
by a complex exponential corresponds to translation in 
the Fourier domain. However, the analogous spectrum 
in the graph setting is discrete and irregularly spaced, 
and it is therefore non-trivial to define an operator that 
corresponds to translation in the graph spectral domain. 

• We intuitively downsample a discrete-time signal by 
deleting every other data point, for example. Yet, what 
does it mean to downsample the signal on the vertices 
of the graph shown in Figure 1? There is not an obvious
notion of "every other vertex" of a weighted graph. 

• Even when we do fix a notion of downsampling, in order 
to create a multiresolution on graphs, we need a method 
to generate a coarser version of the graph that somehow 
captures the structural properties embedded in the original 
graph. 

In addition to dealing with the irregularity of the data 
domain, the graphs in the previously mentioned applications 
can feature a large number of vertices, and therefore many 
data samples. In order to scale well with the size of the data, 
signal processing techniques for graph signals should employ 
localized operations that compute information about the data 
at each vertex by using data from a small neighborhood of 
vertices close to it in the graph. 

To summarize, the overarching challenges of processing
signals on graphs are 1) in cases where the graph is not
directly dictated to us by the application, deciding how to
construct a weighted graph that captures the geometric structure
of the underlying data domain; 2) incorporating the graph
structure into localized transform methods; 3) at the same
time, leveraging invaluable intuitions developed from years
of signal processing research on Euclidean domains; and 4)
developing computationally efficient implementations of the
localized transforms, in order to extract information from
high-dimensional data on graphs and other irregular data domains.

1 Throughout, we refer to signal processing concepts for analog or
discrete-time signals as "classical," in order to differentiate them from
concepts defined in the graph signal framework.

2 The exception is the class of highly regular graphs such as a ring graph
that have circulant graph Laplacians. Grady and Polimeni [14, p. 158] refer to
such graphs as shift invariant graphs.

To address these challenges, the emerging field of signal 
processing on graphs merges algebraic and spectral graph the- 
oretic concepts with computational harmonic analysis. There 
is an extensive literature in both algebraic graph theory (e.g., 
[15]) and spectral graph theory (e.g., [16], [17] and references
therein); however, the bulk of the research prior to the past 
decade focused on analyzing the underlying graphs, as op- 
posed to signals on graphs. 

Finally, we should note that researchers have also designed 
localized signal processing techniques for other irregular data 
domains such as polygonal meshes and manifolds. This work 
includes, for example, low-pass filtering as a smoothing operation
to enhance the overall shape of an object [18], transform
coding based on spectral decompositions for the compression
of geometry data [19], and multiresolution representations of
large meshes by decomposing one surface into multiple levels
with different details [20]. There is no doubt that such work
has inspired and will continue to inspire new signal processing 
techniques in the graph setting. 



B. Outline of the Paper 

The objective of this paper is to offer a tutorial overview 
of the analysis of data on graphs from a signal processing 
perspective. In the next section, we discuss different ways to 
encode the graph structure and define graph spectral domains, 
which are the analogues to the classical frequency domain. 
Section III surveys some generalized operators on signals on
graphs, such as filtering, translation, modulation, and down- 
sampling. These operators form the basis for a number of 
localized, multiscale transform methods, which we review in 
Section IV. We conclude with a brief mention of some open
issues and possible extensions in Section V.

II. The Graph Spectral Domains 

Spectral graph theory has historically focused on construct- 
ing, analyzing, and manipulating graphs, as opposed to signals 
on graphs. It has proved particularly useful for the construction 
of expander graphs [21], graph visualization [17, Section
16.7], spectral clustering [22], graph coloring [17, Section
16.9], and numerous other applications in chemistry, physics,
and computer science (see, e.g., [23] for a recent review).

In the area of signal processing on graphs, spectral graph 
theory has been leveraged as a tool to define frequency 
spectra and expansion bases for graph Fourier transforms. In 
this section, we review some basic definitions and notations 
from spectral graph theory, with a focus on how it enables 
us to extend many of the important mathematical ideas and 
intuitions from classical Fourier analysis to the graph setting. 

A. Weighted Graphs and Graph Signals

We are interested in analyzing signals defined on an undirected,
connected, weighted graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, \mathbf{W}\}$, which
consists of a finite set of vertices $\mathcal{V}$ with $|\mathcal{V}| = N$, a set
of edges $\mathcal{E}$, and a weighted adjacency matrix $\mathbf{W}$. If there is
an edge $e = (i, j)$ connecting vertices $i$ and $j$, the entry $W_{i,j}$
represents the weight of the edge; otherwise, $W_{i,j} = 0$. If a
graph $\mathcal{G}$ is not connected and has $M$ connected components
($M > 1$), we can separate signals on $\mathcal{G}$ into $M$ pieces
corresponding to the $M$ connected components, and process
the separated signals on each of the subgraphs.

When the edge weights are not naturally defined by an
application, one common way to define the weight of an edge
connecting vertices $i$ and $j$ is via a thresholded Gaussian kernel
weighting function:

$$W_{i,j} = \begin{cases} \exp\left(-\dfrac{[\mathrm{dist}(i,j)]^2}{2\theta^2}\right) & \text{if } \mathrm{dist}(i,j) \le \kappa \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

for some parameters $\theta$ and $\kappa$. In (1), $\mathrm{dist}(i,j)$ may represent
a physical distance between vertices $i$ and $j$, or the Euclidean
distance between two feature vectors describing $i$ and $j$,
the latter of which is especially common in graph-based
semi-supervised learning methods. A second common method
is to connect each vertex to its $k$-nearest neighbors based
on the physical or feature space distances. For other graph
construction methods, see, e.g., [14, Chapter 4].
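As a minimal sketch of the thresholded Gaussian kernel construction, the snippet below builds a weighted adjacency matrix from a handful of points. It assumes the usual form of the kernel, a weight of exp(-dist^2 / (2 theta^2)) when the distance is at most kappa and zero otherwise; the function name, the parameter names `theta` and `kappa`, and the sample points are all hypothetical choices made for illustration.

```python
import math

def gaussian_kernel_weights(points, theta, kappa):
    """Weighted adjacency matrix from a thresholded Gaussian kernel.

    points: list of coordinate tuples (illustrative feature vectors).
    theta:  kernel width; kappa: distance threshold (names are assumptions).
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # Euclidean distance
            if d <= kappa:
                # Gaussian weight; pairs farther apart than kappa stay at 0.
                W[i][j] = W[j][i] = math.exp(-d * d / (2.0 * theta ** 2))
    return W

# Four illustrative points: three close together, one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = gaussian_kernel_weights(points, theta=1.0, kappa=2.0)
```

With these sample values the last point lies beyond the threshold from every other point, so it ends up isolated. The resulting graph is then not connected, and its components would be processed separately, as discussed earlier in this subsection.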

A signal or function $f : \mathcal{V} \to \mathbb{R}$ defined on the vertices of
the graph may be represented as a vector $\mathbf{f} \in \mathbb{R}^N$, where the
$i$th component of the vector $\mathbf{f}$ represents the function value at
the $i$th vertex in $\mathcal{V}$.³ The graph signal in Figure 1 is one such
example.

B. The Non-Normalized Graph Laplacian 

The non-normalized graph Laplacian, also called the combinatorial
graph Laplacian, is defined as $\mathbf{L} := \mathbf{D} - \mathbf{W}$, where
the degree matrix $\mathbf{D}$ is a diagonal matrix whose $i$th diagonal
element $d_i$ is equal to the sum of the weights of all the
edges incident to vertex $i$. The graph Laplacian is a difference
operator, as, for any signal $\mathbf{f} \in \mathbb{R}^N$, it satisfies

$$(\mathbf{L}\mathbf{f})(i) = \sum_{j \in \mathcal{N}_i} W_{i,j} [f(i) - f(j)],$$

where the neighborhood $\mathcal{N}_i$ is the set of vertices connected to
vertex $i$ by an edge. More generally, we denote by $\mathcal{N}(i,k)$ the
set of vertices connected to vertex $i$ by a path of $k$ or fewer
edges.
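The difference-operator view can be checked numerically. The sketch below, which assumes NumPy and uses a small hypothetical 4-vertex graph with arbitrary weights, compares the matrix product of D - W with a signal against the sum of weighted differences between each vertex and its neighbors.

```python
import numpy as np

# Hypothetical 4-vertex weighted graph (weights chosen only for illustration).
W = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 2.0, 0.0],
              [0.5, 2.0, 0.0, 1.5],
              [0.0, 0.0, 1.5, 0.0]])
D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # non-normalized graph Laplacian
f = np.array([1.0, -2.0, 0.5, 3.0])

def laplacian_as_difference(W, f):
    """Evaluate the Laplacian at each vertex as a sum of weighted differences."""
    out = np.zeros(len(f))
    for i in range(len(f)):
        for j in np.flatnonzero(W[i]):      # neighbors of vertex i
            out[i] += W[i, j] * (f[i] - f[j])
    return out

Lf_matrix = L @ f
Lf_diff = laplacian_as_difference(W, f)
```

Both routes give the same vector, and each row of the Laplacian sums to zero, which is why constant signals are mapped to zero.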

Because the graph Laplacian $\mathbf{L}$ is a real symmetric matrix,
it has a complete set of orthonormal eigenvectors, which we
denote by $\{\mathbf{u}_\ell\}_{\ell=0,1,\ldots,N-1}$.⁴ These eigenvectors have associated
real, non-negative eigenvalues $\{\lambda_\ell\}_{\ell=0,1,\ldots,N-1}$ satisfying
$\mathbf{L}\mathbf{u}_\ell = \lambda_\ell \mathbf{u}_\ell$, for $\ell = 0, 1, \ldots, N-1$. Zero appears as an
eigenvalue with multiplicity equal to the number of connected
components of the graph [16], and thus, since we consider
connected graphs, we assume the graph Laplacian eigenvalues
are ordered as $0 = \lambda_0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{N-1} := \lambda_{\max}$. We
denote the entire spectrum by $\sigma(\mathbf{L}) := \{\lambda_0, \lambda_1, \ldots, \lambda_{N-1}\}$.

3 In order to analyze data residing on the edges of an unweighted graph,
one option is to build its line graph, where we associate a vertex to each
edge and connect two vertices in the line graph if their corresponding edges
in the original graph share a common vertex, and then analyze the data on
the vertices of the line graph.

4 Note that there is not necessarily a unique set of graph Laplacian
eigenvectors, but we assume throughout that a set of eigenvectors is chosen
and fixed.

C. A Graph Fourier Transform and Notion of Frequency 
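The classical Fourier transform

$$\hat{f}(\xi) := \langle f, e^{2\pi i \xi t} \rangle = \int_{\mathbb{R}} f(t) e^{-2\pi i \xi t} \, dt$$

is the expansion of a function $f$ in terms of the complex exponentials,
which are the eigenfunctions of the one-dimensional
Laplace operator:

$$-\Delta(e^{2\pi i \xi t}) = -\frac{\partial^2}{\partial t^2} e^{2\pi i \xi t} = (2\pi\xi)^2 e^{2\pi i \xi t}. \tag{2}$$

Analogously, we can define the graph Fourier transform $\hat{\mathbf{f}}$ of
any function $\mathbf{f} \in \mathbb{R}^N$ on the vertices of $\mathcal{G}$ as the expansion of
$\mathbf{f}$ in terms of the eigenvectors of the graph Laplacian:

$$\hat{f}(\lambda_\ell) := \langle \mathbf{f}, \mathbf{u}_\ell \rangle = \sum_{i=1}^{N} f(i) u_\ell^*(i). \tag{3}$$

The inverse graph Fourier transform is then given by

$$f(i) = \sum_{\ell=0}^{N-1} \hat{f}(\lambda_\ell) u_\ell(i). \tag{4}$$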
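The forward and inverse graph Fourier transforms reduce to two matrix-vector products once an eigenvector basis is fixed. The following is a minimal sketch, assuming NumPy and a hypothetical unweighted ring graph; `numpy.linalg.eigh` returns real orthonormal eigenvectors for a symmetric matrix, so the complex conjugate in the forward transform has no effect here. The helper names are illustrative only.

```python
import numpy as np

def graph_fourier_basis(W):
    """Eigenvalues (ascending) and orthonormal eigenvectors of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)
    return lam, U

def gft(U, f):
    """Forward transform: inner products of f with each eigenvector."""
    return U.T @ f

def igft(U, f_hat):
    """Inverse transform: expand the coefficients back in the eigenbasis."""
    return U @ f_hat

# Hypothetical example: unweighted ring graph with N = 8 vertices.
N = 8
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0

lam, U = graph_fourier_basis(W)
f = np.arange(N, dtype=float)
f_hat = gft(U, f)
f_rec = igft(U, f_hat)
```

Because the basis is orthonormal, the round trip recovers the signal and the transform preserves the Euclidean norm. The coefficient at eigenvalue zero equals, up to sign, the sum of the signal divided by the square root of N.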

In classical Fourier analysis, the eigenvalues $\{(2\pi\xi)^2\}_{\xi \in \mathbb{R}}$
in (2) carry a specific notion of frequency: for $\xi$ close
to zero (low frequencies), the associated complex exponential
eigenfunctions are smooth, slowly oscillating functions,
whereas for $\xi$ far from zero (high frequencies), the associated
complex exponential eigenfunctions oscillate much more
rapidly. In the graph setting, the graph Laplacian eigenvalues
and eigenvectors provide a similar notion of frequency. For
connected graphs, the Laplacian eigenvector $\mathbf{u}_0$ associated
with the eigenvalue 0 is constant and equal to $1/\sqrt{N}$ at each
vertex. The graph Laplacian eigenvectors associated with low
frequencies $\lambda_\ell$ vary slowly across the graph; i.e., if two
vertices are connected by an edge with a large weight, the
values of the eigenvector at those locations are likely to be
similar. The eigenvectors associated with larger eigenvalues
oscillate more rapidly and are more likely to have dissimilar
values on vertices connected by an edge with high weight.
This is demonstrated in both Figure 2, which shows different
graph Laplacian eigenvectors for a random sensor network
graph, and Figure 3, which shows the number $|\mathcal{Z}_{\mathcal{G}}(\cdot)|$ of zero
crossings of each graph Laplacian eigenvector. The set of zero
crossings of a signal $\mathbf{f}$ on a graph $\mathcal{G}$ is defined as

$$\mathcal{Z}_{\mathcal{G}}(\mathbf{f}) := \{ e = (i,j) \in \mathcal{E} : f(i) f(j) < 0 \};$$

that is, the set of edges connecting a vertex with a positive
signal to a vertex with a negative signal.
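The zero-crossing count is straightforward to compute. The sketch below assumes NumPy and uses a hypothetical unweighted path graph on 8 vertices, chosen because its Laplacian eigenvalues are distinct and its eigenvectors are the familiar sampled cosines; the function name is illustrative.

```python
import numpy as np

def zero_crossings(W, f):
    """Edges (i, j) with i < j whose endpoint signal values have opposite signs."""
    rows, cols = np.nonzero(np.triu(W, k=1))
    return [(int(i), int(j)) for i, j in zip(rows, cols) if f[i] * f[j] < 0]

# Hypothetical example: unweighted path graph with N = 8 vertices.
N = 8
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)          # eigenvalues in ascending order

# Number of zero crossings of each eigenvector, ordered by eigenvalue.
counts = [len(zero_crossings(W, U[:, l])) for l in range(N)]
```

On this path graph the eigenvector with index l crosses zero exactly l times, so the count grows with the eigenvalue. General graphs show the same trend, though usually not this exact one-to-one pattern.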

Fig. 2. Three graph Laplacian eigenvectors of a random sensor network
graph. The signals' component values are represented by the blue (positive)
and black (negative) bars coming out of the vertices. Note that $\mathbf{u}_{50}$ contains
many more zero crossings than the constant eigenvector $\mathbf{u}_0$ and the smooth
Fiedler vector $\mathbf{u}_1$.




Fig. 3. The number of zero crossings, $|\mathcal{Z}_{\mathcal{G}}(\mathbf{u}_\ell)|$ in (a) and $|\mathcal{Z}_{\mathcal{G}}(\tilde{\mathbf{u}}_\ell)|$ in
(b), of the non-normalized and normalized graph Laplacian eigenvectors,
respectively. In both cases, the Laplacian eigenvectors associated with larger
eigenvalues cross zero more often, confirming the interpretation of the graph
Laplacian eigenvalues as notions of frequency.



D. Graph Signal Representations in Two Domains 

The graph Fourier transform (3) and its inverse (4) give
us a way to equivalently represent a signal in two different
domains: the vertex domain and the graph spectral domain.
While we often start with a signal $\mathbf{g}$ in the vertex domain,
it may also be useful to define a signal $\hat{\mathbf{g}}$ directly in the
graph spectral domain. We refer to such signals as kernels.
In Figures 4(a) and 4(b), one such signal, a heat kernel, is
shown in both domains. Analogously to the classical analog
case, the graph Fourier coefficients of a smooth signal such
as the one shown in Figure 4 decay rapidly. Such signals are
compressible as they can be closely approximated by just a
few graph Fourier coefficients (see, e.g., [24]-[26] for ways
to exploit this compressibility).





Fig. 4. Equivalent representations of a graph signal in the vertex and graph
spectral domains. (a) A signal $\mathbf{g}$ that resides on the vertices of the Minnesota
road graph [27] with Gaussian edge weights as in (1). The signal's component
values are represented by the blue (positive) and black (negative) bars coming
out of the vertices. (b) The same signal in the graph spectral domain. In this
case, the signal is a heat kernel which is actually defined directly in the
graph spectral domain by $\hat{g}(\lambda_\ell) = e^{-5\lambda_\ell}$. The signal plotted in (a) is then
determined by taking an inverse graph Fourier transform (4) of $\hat{\mathbf{g}}$.



E. Discrete Calculus and Signal Smoothness with Respect to 
the Intrinsic Structure of the Graph 

When we analyze signals, it is important to emphasize that 
properties such as smoothness are with respect to the intrinsic 
structure of the data domain, which in our context is the 
weighted graph. Whereas differential geometry provides tools 
to incorporate the geometric structure of the underlying man- 
ifold into the analysis of continuous signals on differentiable 
manifolds, discrete calculus provides a "set of definitions 
and differential operators that make it possible to operate the 
machinery of multivariate calculus on a finite, discrete space 

G3 p. 

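To add mathematical precision to the notion of smoothness
with respect to the intrinsic structure of the underlying graph,
we briefly present some of the discrete differential operators
defined in [4], [6]-[8], [14], [28]-[30].⁵ The edge derivative
of a signal $\mathbf{f}$ with respect to edge $e = (i,j)$ at vertex $i$ is
defined as

$$\left.\frac{\partial \mathbf{f}}{\partial e}\right|_i := \sqrt{W_{i,j}} \, [f(j) - f(i)],$$

and the graph gradient of $\mathbf{f}$ at vertex $i$ is the vector

$$\nabla_i \mathbf{f} := \left[ \left\{ \left.\frac{\partial \mathbf{f}}{\partial e}\right|_i \right\}_{e \in \mathcal{E} \text{ s.t. } e=(i,j) \text{ for some } j \in \mathcal{V}} \right].$$

Then the local variation at vertex $i$

$$\|\nabla_i \mathbf{f}\|_2 := \left[ \sum_{e \in \mathcal{E} \text{ s.t. } e=(i,j) \text{ for some } j \in \mathcal{V}} \left( \left.\frac{\partial \mathbf{f}}{\partial e}\right|_i \right)^2 \right]^{1/2} = \left[ \sum_{j \in \mathcal{N}_i} W_{i,j} [f(j) - f(i)]^2 \right]^{1/2}$$

provides a measure of local smoothness of $\mathbf{f}$ around vertex $i$,
as it is small when the function $\mathbf{f}$ has similar values at $i$ and all
neighboring vertices of $i$.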

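For notions of global smoothness, the discrete p-Dirichlet
form of $\mathbf{f}$ is defined as

$$S_p(\mathbf{f}) := \frac{1}{p} \sum_{i \in \mathcal{V}} \|\nabla_i \mathbf{f}\|_2^p = \frac{1}{p} \sum_{i \in \mathcal{V}} \left[ \sum_{j \in \mathcal{N}_i} W_{i,j} [f(j) - f(i)]^2 \right]^{p/2}. \tag{5}$$

When $p = 1$, $S_1(\mathbf{f})$ is the total variation of the signal with
respect to the graph. When $p = 2$, we have

$$S_2(\mathbf{f}) = \frac{1}{2} \sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} W_{i,j} [f(j) - f(i)]^2 = \sum_{(i,j) \in \mathcal{E}} W_{i,j} [f(j) - f(i)]^2 = \mathbf{f}^{T} \mathbf{L} \mathbf{f}. \tag{6}$$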

5 Note that the names of many of the discrete calculus operators correspond 
to the analogous operators in the continuous setting. In some problems, the 
weighted graph arises from a discrete sampling of a smooth manifold. In that 
situation, the discrete differential operators may converge - possibly under 
additional assumptions - to their namesake continuous operators as the density 
of the sampling increases. For example, [31]-[34] examine the convergence
of discrete graph Laplacians (normalized and non-normalized) to continuous 
manifold Laplacians. 

$S_2(\mathbf{f})$ is known as the graph Laplacian quadratic form [17],
and the semi-norm $\|\mathbf{f}\|_{\mathbf{L}}$ is defined as

$$\|\mathbf{f}\|_{\mathbf{L}} := \|\mathbf{L}^{1/2} \mathbf{f}\|_2 = \sqrt{\mathbf{f}^{T} \mathbf{L} \mathbf{f}} = \sqrt{S_2(\mathbf{f})}.$$

Note from (6) that the quadratic form $S_2(\mathbf{f})$ is equal to zero
if and only if $\mathbf{f}$ is constant across all vertices (which is why
$\|\mathbf{f}\|_{\mathbf{L}}$ is only a semi-norm), and, more generally, $S_2(\mathbf{f})$ is small
when the signal $\mathbf{f}$ has similar values at neighboring vertices
connected by an edge with a large weight; i.e., when it is
smooth.
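The equivalence between the Laplacian quadratic form and the sum of weighted squared differences over edges can be verified directly. The sketch below assumes NumPy and reuses a small hypothetical 4-vertex graph with arbitrary weights; it also computes the quadratic form as one half of the summed squared local variations.

```python
import numpy as np

# Hypothetical 4-vertex weighted graph (weights chosen only for illustration).
W = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 2.0, 0.0],
              [0.5, 2.0, 0.0, 1.5],
              [0.0, 0.0, 1.5, 0.0]])
L = np.diag(W.sum(axis=1)) - W
f = np.array([1.0, -2.0, 0.5, 3.0])
n = len(f)

# Quadratic form computed with the Laplacian matrix.
quad_form = f @ L @ f

# Same quantity as a sum over edges, counting each edge once (i < j).
edge_sum = sum(W[i, j] * (f[j] - f[i]) ** 2
               for i in range(n) for j in range(i + 1, n))

# Squared local variation at each vertex; half their sum is the 2-Dirichlet form.
local_var_sq = np.array([sum(W[i, j] * (f[j] - f[i]) ** 2 for j in range(n))
                         for i in range(n)])
dirichlet_2 = 0.5 * local_var_sq.sum()

# A constant signal has zero quadratic form.
quad_const = np.ones(n) @ L @ np.ones(n)
```

The factor of one half appears because summing local variations over all vertices visits every edge twice.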

Returning to the graph Laplacian eigenvalues and eigenvectors,
the Courant-Fischer Theorem [35, Theorem 4.2.11]
tells us they can also be defined iteratively via the Rayleigh
quotient as

$$\lambda_0 = \min_{\substack{\mathbf{f} \in \mathbb{R}^N \\ \|\mathbf{f}\|_2 = 1}} \{ \mathbf{f}^{T} \mathbf{L} \mathbf{f} \}, \tag{7}$$

$$\text{and } \lambda_\ell = \min_{\substack{\mathbf{f} \in \mathbb{R}^N \\ \|\mathbf{f}\|_2 = 1 \\ \mathbf{f} \perp \mathrm{span}\{\mathbf{u}_0, \ldots, \mathbf{u}_{\ell-1}\}}} \{ \mathbf{f}^{T} \mathbf{L} \mathbf{f} \}, \quad \ell = 1, 2, \ldots, N-1, \tag{8}$$

where the eigenvector $\mathbf{u}_\ell$ is the minimizer of the $\ell$th problem.
From (6) and (7), we see again why $\mathbf{u}_0$ is constant
for connected graphs. Equation (8) explains why the graph
Laplacian eigenvectors associated with lower eigenvalues are
smoother, and provides another interpretation for why the
graph Laplacian spectrum carries a notion of frequency.
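The variational characterization can be probed numerically. The following sketch, assuming NumPy and a hypothetical 4-vertex graph, checks that each eigenvector attains its eigenvalue as a Rayleigh quotient, and that random vectors orthogonal to the first few eigenvectors never produce a smaller quotient. Random sampling only illustrates the bound; it is not a proof of the minimum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-vertex weighted graph (weights chosen only for illustration).
W = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 2.0, 0.0],
              [0.5, 2.0, 0.0, 1.5],
              [0.0, 0.0, 1.5, 0.0]])
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

def rayleigh(L, f):
    """Rayleigh quotient of f; equals the quadratic form for unit-norm f."""
    return (f @ L @ f) / (f @ f)

# Each eigenvector attains its own eigenvalue.
attained = np.array([rayleigh(L, U[:, l]) for l in range(len(lam))])

def constrained_quotients(l, trials=200):
    """Rayleigh quotients of random vectors orthogonal to u_0, ..., u_{l-1}."""
    out = []
    for _ in range(trials):
        f = rng.standard_normal(len(lam))
        f -= U[:, :l] @ (U[:, :l].T @ f)   # project out the first l eigenvectors
        out.append(rayleigh(L, f))
    return np.array(out)

lower_bounds_hold = all(constrained_quotients(l).min() >= lam[l] - 1e-9
                        for l in range(len(lam)))
```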

Example 1 in the box below demonstrates the importance of
incorporating the underlying graph structure when processing 
signals on graphs. 






Example 1 (Importance of the underlying graph): 

In the figure above, we plot the same signal $\mathbf{f}$ on
three different unweighted graphs with the same set
of vertices, but different edges. The top row shows the
signal in the vertex domains, and the bottom row shows
the signal in the respective graph spectral domains.

The smoothness and graph spectral content of the
signal both depend on the underlying graph structure.
In particular, the signal $\mathbf{f}$ is smoothest with respect
to the intrinsic structure of $\mathcal{G}_1$, and least smooth with
respect to the intrinsic structure of $\mathcal{G}_3$. This can be seen
(i) visually; (ii) through the Laplacian quadratic form,
as $\mathbf{f}^{T}\mathbf{L}_1\mathbf{f} = 0.14$, $\mathbf{f}^{T}\mathbf{L}_2\mathbf{f} = 1.31$, and $\mathbf{f}^{T}\mathbf{L}_3\mathbf{f} = 1.81$; and
(iii) through the graph spectral representations, where
the signal has all of its energy in the low frequencies
in the graph spectral plot of $\mathbf{f}$ on $\mathcal{G}_1$, and more energy
in the higher frequencies in the graph spectral plot of
$\mathbf{f}$ on $\mathcal{G}_3$.



F. Other Graph Matrices 

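The basis $\{\mathbf{u}_\ell\}_{\ell=0,1,\ldots,N-1}$ of graph Laplacian eigenvectors
is just one possible basis to use in the forward and inverse
graph Fourier transforms (3) and (4). A second popular option
is to normalize each weight $W_{i,j}$ by a factor $1/\sqrt{d_i d_j}$. Doing so
leads to the normalized graph Laplacian, which is defined as
$\tilde{\mathbf{L}} := \mathbf{D}^{-1/2} \mathbf{L} \mathbf{D}^{-1/2}$, or, equivalently,

$$(\tilde{\mathbf{L}} \mathbf{f})(i) = \frac{1}{\sqrt{d_i}} \sum_{j \in \mathcal{N}_i} W_{i,j} \left[ \frac{f(i)}{\sqrt{d_i}} - \frac{f(j)}{\sqrt{d_j}} \right].$$

The eigenvalues $\{\tilde{\lambda}_\ell\}_{\ell=0,1,\ldots,N-1}$ of the normalized Laplacian
of a connected graph $\mathcal{G}$ satisfy

$$0 = \tilde{\lambda}_0 < \tilde{\lambda}_1 \le \ldots \le \tilde{\lambda}_{\max} \le 2,$$

with $\tilde{\lambda}_{\max} = 2$ if and only if $\mathcal{G}$ is bipartite; i.e., the set of
vertices $\mathcal{V}$ can be partitioned into two subsets $\mathcal{V}_1$ and $\mathcal{V}_2$ such
that every edge $e \in \mathcal{E}$ connects one vertex in $\mathcal{V}_1$ and one vertex
in $\mathcal{V}_2$. We denote the normalized graph Laplacian eigenvectors
by $\{\tilde{\mathbf{u}}_\ell\}_{\ell=0,1,\ldots,N-1}$. As seen in Figure 3(b), the spectrum of
$\tilde{\mathbf{L}}$ also carries a notion of frequency, with the eigenvectors
associated with higher eigenvalues generally having more zero
crossings. However, unlike $\mathbf{u}_0$, the normalized graph Laplacian
eigenvector $\tilde{\mathbf{u}}_0$ associated with the zero eigenvalue is not a
constant vector.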
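The spectral properties of the normalized Laplacian can be illustrated on two tiny graphs. This sketch assumes NumPy and uses a hypothetical 4-vertex path (bipartite) and a triangle (not bipartite); the helper name is illustrative, and the construction assumes no vertex has zero degree.

```python
import numpy as np

def normalized_laplacian(W):
    """D^{-1/2} (D - W) D^{-1/2}; assumes every vertex has positive degree."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    return s[:, None] * (np.diag(d) - W) * s[None, :]

# Path graph on 4 vertices (bipartite) and a triangle (not bipartite).
W_path = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
W_tri = np.ones((3, 3)) - np.eye(3)

lam_path, U_path = np.linalg.eigh(normalized_laplacian(W_path))
lam_tri, _ = np.linalg.eigh(normalized_laplacian(W_tri))

# The eigenvector for eigenvalue 0 is proportional to the square root of the degrees.
d_path = W_path.sum(axis=1)
u0_expected = np.sqrt(d_path) / np.linalg.norm(np.sqrt(d_path))
```

The path graph reaches the upper bound of 2, while the triangle stays strictly below it. The zero-eigenvalue eigenvector of the path follows the square roots of the degrees, so it is constant only when all degrees are equal.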



The normalized and non-normalized Laplacians are both
examples of generalized graph Laplacians [36, Section 1.6],
also called discrete Schrödinger operators. A generalized
graph Laplacian of a graph $\mathcal{G}$ is any symmetric matrix whose
$(i,j)$th entry is negative if there is an edge connecting vertices
$i$ and $j$, equal to zero if $i \neq j$ and $i$ is not connected to $j$, and
may be anything if $i = j$.

A third popular matrix which is often used in
dimensionality-reduction techniques for signals on graphs
is the random walk matrix $\mathbf{P} := \mathbf{D}^{-1}\mathbf{W}$. Each entry $P_{i,j}$
describes the probability of going from vertex $i$ to vertex $j$
in one step of a Markov random walk on the graph $\mathcal{G}$. For
connected, aperiodic graphs, each row of $\mathbf{P}^t$ converges to the
stationary distribution of the random walk as $t$ goes to infinity.
Closely related to the random walk matrix is the asymmetric
graph Laplacian, which is defined as $\mathbf{L}_a := \mathbf{I}_N - \mathbf{P}$, where
$\mathbf{I}_N$ is the $N \times N$ identity matrix.⁶ Note that $\mathbf{L}_a$ has the
same set of eigenvalues as $\tilde{\mathbf{L}}$, and if $\tilde{\mathbf{u}}_\ell$ is an eigenvector of
$\tilde{\mathbf{L}}$ associated with $\tilde{\lambda}_\ell$, then $\mathbf{D}^{-1/2}\tilde{\mathbf{u}}_\ell$ is an eigenvector of $\mathbf{L}_a$
associated with the eigenvalue $\tilde{\lambda}_\ell$.
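The relationships among the random walk matrix, the asymmetric graph Laplacian, and the normalized Laplacian can be checked on a small example. The sketch below assumes NumPy and a hypothetical connected 4-vertex graph that contains a triangle, which makes the walk aperiodic. It relies on the standard fact that the stationary distribution of a random walk on an undirected weighted graph is proportional to the vertex degrees.

```python
import numpy as np

# Hypothetical connected, non-bipartite 4-vertex graph (illustrative weights).
W = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 2.0, 0.0],
              [0.5, 2.0, 0.0, 1.5],
              [0.0, 0.0, 1.5, 0.0]])
d = W.sum(axis=1)

P = W / d[:, None]                                   # random walk matrix D^{-1} W
L_norm = (np.diag(d) - W) / np.sqrt(np.outer(d, d))  # D^{-1/2} (D - W) D^{-1/2}
L_a = np.eye(len(d)) - P                             # asymmetric graph Laplacian

# Eigenpairs of the normalized Laplacian, mapped to eigenvectors of L_a.
lam, U_norm = np.linalg.eigh(L_norm)
V = U_norm / np.sqrt(d)[:, None]                     # columns are D^{-1/2} u_l

# Rows of P^t approach the stationary distribution for large t.
P_limit = np.linalg.matrix_power(P, 200)
stationary = d / d.sum()
```

Each column of `V` satisfies the eigenvalue equation for the asymmetric Laplacian with the corresponding normalized-Laplacian eigenvalue, and every row of the matrix power is close to the degree-proportional distribution.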

As discussed in detail in the next section, both the normalized
and non-normalized graph Laplacian eigenvectors can be
used as filtering bases. There is not a clear answer as to when
to use the normalized graph Laplacian eigenvectors, when
to use the non-normalized graph Laplacian eigenvectors, and
when to use some other basis. The normalized Laplacian has
the nice properties that its spectrum is always contained in
the interval [0, 2] and, for bipartite graphs, the spectral folding
phenomenon [37] can be exploited. However, the fact that
the non-normalized graph Laplacian eigenvector associated
with the zero eigenvalue is constant is a useful property in
extending intuitions about DC components of signals from
classical filtering theory.

6 $\mathbf{L}_a$ is not a generalized graph Laplacian due to its asymmetry.

III. Generalized Operators for Signals on Graphs 

In this section, we review different ways to generalize 
fundamental operations such as filtering, translation, modu- 
lation, dilation, and downsampling to the graph setting. These 
generalized operators are the ingredients used to develop the 
localized, multiscale transforms described in Section IV.

A. Filtering 

The first generalized operation we tackle is filtering. We 
start by extending the notion of frequency filtering to the 
graph setting, and then discuss localized filtering in the vertex 
domain. 

1) Frequency Filtering: In classical signal processing, frequency
filtering is the process of representing an input signal
as a linear combination of complex exponentials, and amplifying
or attenuating the contributions of some of the component
complex exponentials:

$$\hat{f}_{out}(\xi) = \hat{f}_{in}(\xi) \hat{h}(\xi), \tag{9}$$

where $\hat{h}(\cdot)$ is the transfer function of the filter. Taking an
inverse Fourier transform of (9), multiplication in the Fourier
domain corresponds to convolution in the time domain:

$$f_{out}(t) = \int_{\mathbb{R}} \hat{f}_{in}(\xi) \hat{h}(\xi) e^{2\pi i \xi t} \, d\xi \tag{10}$$

$$= \int_{\mathbb{R}} f_{in}(\tau) h(t - \tau) \, d\tau =: (f_{in} * h)(t). \tag{11}$$

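Once we fix a graph spectral representation, and thus our
notion of a graph Fourier transform (in this section, we use
the eigenvectors of $\mathbf{L}$, but $\tilde{\mathbf{L}}$ can also be used), we can directly
generalize (9) to define frequency filtering, or graph spectral
filtering, as

$$\hat{f}_{out}(\lambda_\ell) = \hat{f}_{in}(\lambda_\ell) \hat{h}(\lambda_\ell), \tag{12}$$

or, equivalently, taking an inverse graph Fourier transform,

$$f_{out}(i) = \sum_{\ell=0}^{N-1} \hat{f}_{in}(\lambda_\ell) \hat{h}(\lambda_\ell) u_\ell(i). \tag{13}$$

Borrowing notation from the theory of matrix functions [38],
we can also write (12) and (13) as $\mathbf{f}_{out} = \hat{h}(\mathbf{L}) \mathbf{f}_{in}$, where

$$\hat{h}(\mathbf{L}) := \mathbf{U} \begin{bmatrix} \hat{h}(\lambda_0) & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \hat{h}(\lambda_{N-1}) \end{bmatrix} \mathbf{U}^{T}. \tag{14}$$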
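Graph spectral filtering amounts to three steps: transform, scale each coefficient by the kernel evaluated at its eigenvalue, and transform back. The sketch below assumes NumPy and a hypothetical unweighted ring graph; the function name and the heat-type kernel exp(-2 lambda) are illustrative choices, not taken from the text.

```python
import numpy as np

def spectral_filter(W, f_in, h):
    """Apply the graph spectral filter h(L) = U diag(h(lambda)) U^T to f_in."""
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)
    f_hat = U.T @ f_in              # forward graph Fourier transform
    return U @ (h(lam) * f_hat)     # scale coefficients, then invert

# Hypothetical example: ring graph with N = 16 vertices.
N = 16
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0

# Constant offset (eigenvalue 0) plus an alternating pattern (eigenvalue 4).
alternating = np.array([(-1.0) ** i for i in range(N)])
f_in = 1.0 + alternating

heat = lambda lam: np.exp(-2.0 * lam)   # illustrative low-pass kernel
f_out = spectral_filter(W, f_in, heat)
```

On an even ring the alternating pattern is itself a Laplacian eigenvector with eigenvalue 4, so the filter leaves the constant part untouched and scales the alternating part by exp(-8).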

The basic graph spectral filtering (12) can be used to implement
discrete versions of well-known continuous filtering
techniques such as Gaussian smoothing, bilateral filtering, total
variation filtering, anisotropic diffusion, and non-local means
filtering (see, e.g., [39] and references therein). In particular,
many of these filters arise as solutions to variational problems
to regularize ill-posed inverse problems such as denoising,
inpainting, and super-resolution. One example is the discrete
regularization framework

$$\min_{\mathbf{f}} \left\{ \|\mathbf{f} - \mathbf{y}\|_2^2 + \gamma S_p(\mathbf{f}) \right\}, \tag{15}$$

where $S_p(\mathbf{f})$ is the p-Dirichlet form of (5). References [4]-[11],
[14, Chapter 5], and [28]-[30] discuss (15) and other
energy minimization models in detail, as well as specific filters
that arise as solutions, relations between these discrete graph
spectral filters and filters arising out of continuous partial
differential equations, and applications such as graph-based
image processing, mesh smoothing, and statistical learning. In
Example 2, we show one particular image denoising application
of (15) with $p = 2$.

2) Filtering in the Vertex Domain: To filter a signal in the
vertex domain, we simply write the output $f_{out}(i)$ at vertex $i$
as a linear combination of the components of the input signal
at vertices within a $K$-hop local neighborhood of vertex $i$:

$$f_{out}(i) = b_{i,i} f_{in}(i) + \sum_{j \in \mathcal{N}(i,K)} b_{i,j} f_{in}(j), \tag{18}$$

for some constants $\{b_{i,j}\}_{i,j \in \mathcal{V}}$. Equation (18) just says that
filtering in the vertex domain is a localized linear transform.

We now briefly relate filtering in the graph spectral domain
(frequency filtering) to filtering in the vertex domain. When
the frequency filter in (12) is an order $K$ polynomial
$\hat{h}(\lambda_\ell) = \sum_{k=0}^{K} a_k \lambda_\ell^k$ for some constants $\{a_k\}_{k=0,1,\ldots,K}$, we can also
interpret the filtering equation (12) in the vertex domain. From
(13), we have

$$\begin{aligned} f_{out}(i) &= \sum_{\ell=0}^{N-1} \hat{f}_{in}(\lambda_\ell) \hat{h}(\lambda_\ell) u_\ell(i) \\ &= \sum_{j=1}^{N} f_{in}(j) \sum_{k=0}^{K} a_k \sum_{\ell=0}^{N-1} \lambda_\ell^k u_\ell^*(j) u_\ell(i) \\ &= \sum_{j=1}^{N} f_{in}(j) \sum_{k=0}^{K} a_k (\mathbf{L}^k)_{i,j}. \end{aligned} \tag{19}$$

Yet, $(\mathbf{L}^k)_{i,j} = 0$ when the shortest-path distance $d_{\mathcal{G}}(i,j)$
between vertices $i$ and $j$ (i.e., the minimum number of edges
comprising any path connecting $i$ and $j$) is greater than $k$ [41,
Lemma 5.2]. Therefore, we can write (19) exactly as in (18),
with the constants defined as

$$b_{i,j} := \sum_{k=d_{\mathcal{G}}(i,j)}^{K} a_k (\mathbf{L}^k)_{i,j}.$$

So when the frequency filter is an order $K$ polynomial,
the frequency filtered signal at vertex $i$, $f_{out}(i)$, is a linear
combination of the components of the input signal at vertices
within a $K$-hop local neighborhood of vertex $i$. This property
can be quite useful when relating the smoothness of a filtering
kernel to the localization of filtered signals in the vertex
domain.
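The localization property of polynomial filters can be seen directly in the filter matrix. The sketch below assumes NumPy, a hypothetical unweighted path graph on 8 vertices, and arbitrary illustrative coefficients for an order-2 polynomial. It builds the filter once as a polynomial in the Laplacian and once in the spectral domain, then inspects the entries between vertices more than 2 hops apart.

```python
import numpy as np

# Hypothetical example: unweighted path graph with N = 8 vertices.
N, K = 8, 2
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

a = [0.5, -0.3, 0.1]   # illustrative polynomial coefficients a_0, a_1, a_2

# Vertex-domain form: sum over k of a_k * L^k.
H_vertex = sum(a_k * np.linalg.matrix_power(L, k) for k, a_k in enumerate(a))

# Spectral form: U diag(h(lambda)) U^T with h evaluated at each eigenvalue.
lam, U = np.linalg.eigh(L)
h_lam = sum(a_k * lam ** k for k, a_k in enumerate(a))
H_spectral = U @ np.diag(h_lam) @ U.T

# On a path graph the hop distance between vertices i and j is |i - j|.
hops = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
far_entries = H_vertex[hops > K]
```

The two constructions agree, and every entry linking vertices more than K hops apart is zero. The filtered value at a vertex therefore depends only on input values within its K-hop neighborhood, and the filter can be applied with K sparse matrix-vector products without computing any eigenvectors.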

Example 2 (Tikhonov regularization): We observe a noisy graph signal $\mathbf{y} = \mathbf{f}_0 + \boldsymbol{\eta}$, where $\boldsymbol{\eta}$ is uncorrelated additive
Gaussian noise, and wish to recover $\mathbf{f}_0$. To enforce a priori information that the clean signal $\mathbf{f}_0$ is smooth with respect to
the underlying graph, we include a regularization term of the form $\mathbf{f}^{T}\mathbf{L}\mathbf{f}$, and, for a fixed $\gamma > 0$, solve the optimization
problem

$$\operatorname*{argmin}_{\mathbf{f}} \left\{ \|\mathbf{f} - \mathbf{y}\|_2^2 + \gamma \mathbf{f}^{T} \mathbf{L} \mathbf{f} \right\}. \tag{16}$$

The first-order optimality conditions of the convex objective function in (16) show that (see, e.g., [4], [29, Section III-A],
[40, Proposition 1]) the optimal reconstruction is given by

$$f_*(i) = \sum_{\ell=0}^{N-1} \left[ \frac{1}{1 + \gamma \lambda_\ell} \right] \hat{y}(\lambda_\ell) u_\ell(i), \tag{17}$$

or, equivalently, $\mathbf{f} = \hat{h}(\mathbf{L})\mathbf{y}$, where $\hat{h}(\lambda) := \frac{1}{1 + \gamma\lambda}$ can be viewed as a low-pass filter.
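The closed-form solution can be checked against a direct linear solve. Setting the gradient of the objective to zero gives the linear system (I + gamma L) f = y. The sketch below assumes NumPy, a hypothetical path graph, a made-up smooth signal, and an arbitrary regularization weight; it compares the vertex-domain solve with the spectral low-pass filter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: path graph with N = 32 vertices, regularization weight gamma.
N, gamma = 32, 5.0
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

f0 = np.cos(np.pi * np.arange(N) / N)        # smooth clean signal (illustrative)
y = f0 + 0.1 * rng.standard_normal(N)        # noisy observation

# Vertex-domain solution of the optimality condition (I + gamma L) f = y.
f_solve = np.linalg.solve(np.eye(N) + gamma * L, y)

# Spectral-domain solution: scale each coefficient by 1 / (1 + gamma * lambda).
lam, U = np.linalg.eigh(L)
f_spec = U @ ((U.T @ y) / (1.0 + gamma * lam))

smoothness_before = y @ L @ y
smoothness_after = f_solve @ L @ f_solve
```

The two solutions coincide. Because every spectral coefficient is scaled by a factor of at most one, the Laplacian quadratic form of the output never exceeds that of the noisy input. For large graphs the linear solve, or an iterative approximation of it, avoids the full eigendecomposition.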

As an example, in the figure below, we take the 512 x 512 cameraman image as $\mathbf{f}_0$ and corrupt it with additive
Gaussian noise with mean zero and standard deviation 0.1 to get a noisy signal $\mathbf{y}$. We then apply two different filtering
methods to denoise the signal. In the first method, we apply a symmetric two-dimensional Gaussian lowpass filter of
size 72 x 72 with two different standard deviations: 1.5 and 3.5. In the second method, we form a semi-local graph on
the pixels by connecting each pixel to its horizontal, vertical, and diagonal neighbors, and setting the Gaussian weights
(1) between two neighboring pixels according to the similarity of the noisy image values at those two pixels; i.e., the
edges of the semi-local graph are independent of the noisy image, but the distances in (1) are simply the differences
between the neighboring pixel values in the noisy image. For the Gaussian weights in (1), we take $\theta = 0.1$ and $\kappa = 0$.
We then perform the low-pass graph filtering (17) to reconstruct the image. This method is a variant of the graph-based
anisotropic diffusion image smoothing method of [11].

In all image displays, we threshold the values to the [0,1] interval. The bottom row of images is comprised of 
zoomed-in versions of the top row of images. Comparing the results of the two filtering methods, we see that in order to 
smooth sufficiently in smoother areas of the image, the classical Gaussian filter also smooths across the image edges. 
The graph-filtering method does not smooth as much across the image edges, as the geometric structure of the image 
is encoded in the graph Laplacian via the noisy image. 



[Figure panels: Original Image; Noisy Image; Gaussian-Filtered (Std. Dev. = 1.5); Gaussian-Filtered.]








B. Convolution 






We cannot directly generalize the definition (11) of a
convolution product to the graph setting, because of the term
$h(t - \tau)$. However, one way to define a generalized convolution
product for signals on graphs is to replace the complex
exponentials in (10) with the graph Laplacian eigenvectors:

$$
(f * h)(i) := \sum_{\ell=0}^{N-1} \hat{f}(\lambda_\ell)\,\hat{h}(\lambda_\ell)\,u_\ell(i), \qquad (20)
$$

which enforces the property that convolution in the vertex
domain is equivalent to multiplication in the graph spectral
domain.
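
A minimal sketch of the generalized convolution (20) follows. The five-vertex weighted graph and the two signals are hypothetical, and the helper names (`gft`, `igft`, `graph_convolve`) are introduced only for this illustration.

```python
import numpy as np

def laplacian_eig(W):
    """Eigenvalues and orthonormal eigenvectors of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.eigh(L)

def gft(U, f):
    """Graph Fourier transform: expansion coefficients in the eigenvector basis."""
    return U.T @ f

def igft(U, f_hat):
    """Inverse graph Fourier transform."""
    return U @ f_hat

def graph_convolve(U, f, h):
    """Generalized convolution: multiply spectra, then transform back."""
    return igft(U, gft(U, f) * gft(U, h))

# Hypothetical weighted graph on 5 vertices (symmetric weight matrix).
W = np.array([[0, 1, 0, 0, 2],
              [1, 0, 3, 0, 0],
              [0, 3, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [2, 0, 0, 1, 0]], dtype=float)
lam, U = laplacian_eig(W)

f = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
h = np.array([0.2, 0.1, 0.0, -0.4, 1.0])
conv = graph_convolve(U, f, h)
```

Because each eigenvector is only determined up to sign (and up to a change of basis within repeated eigenvalues), the output of this sketch depends on the eigenvector convention returned by the solver; the multiplication property in the spectral domain holds for any fixed choice.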
C. Translation

The classical translation operator is defined through the
change of variable $(T_u f)(t) := f(t - u)$, which, as discussed
earlier, we cannot directly generalize to the graph
setting. However, we can also view the classical translation
operator $T_u$ as a convolution with a delta centered at u; i.e.,
$(T_u f)(t) = (f * \delta_u)(t)$ in the weak sense. Thus, one way to
define a generalized translation operator $T_n : \mathbb{R}^N \to \mathbb{R}^N$ is
via generalized convolution with a delta centered at vertex n:

$$
(T_n f)(i) := \sqrt{N}\,(f * \delta_n)(i) = \sqrt{N} \sum_{\ell=0}^{N-1} \hat{f}(\lambda_\ell)\,u_\ell^{*}(n)\,u_\ell(i), \qquad (21)
$$

where

$$
\delta_n(i) = \begin{cases} 1 & \text{if } i = n \\ 0 & \text{otherwise.} \end{cases} \qquad (22)
$$

A few remarks about the generalized translation (21) are in
order. First, we do not usually view it as translating a signal f
defined in the vertex domain, but rather as a kernelized operator
acting on a kernel $\hat{f}(\cdot)$ defined directly in the graph spectral
domain. To translate this kernel to vertex n, the $\ell$th component
of the kernel is multiplied by $u_\ell^{*}(n)$, and then an inverse graph
Fourier transform is applied. Second, the normalizing constant
$\sqrt{N}$ in (21) ensures that the translation operator preserves the
mean of a signal; i.e., $\sum_{i=1}^{N} (T_n g)(i) = \sum_{i=1}^{N} g(i)$. Third, the
smoothness of the kernel $\hat{g}(\cdot)$ controls the localization of $T_n g$
around the center vertex n; that is, the magnitude $|(T_n g)(i)|$
of the translated kernel at vertex i decays as the distance
between i and n increases [41]. This property can be seen in
Figure 5, where we translate a heat kernel around to different
locations of the Minnesota graph. Finally, unlike the classical
translation operator, the generalized translation operator (21)
is not generally an isometric operator ($\|T_n g\|_2 \neq \|g\|_2$), due
to the possible localization of the graph Laplacian eigenvectors.
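
The second and last remarks can be illustrated numerically. The sketch below assumes a path graph and a heat kernel $\hat{g}(\lambda) = e^{-2\lambda}$ (both chosen for illustration, not the Minnesota graph of Figure 5), and fixes the sign of the constant eigenvector so that $u_0 = 1/\sqrt{N}$, which the mean-preservation identity relies on.

```python
import numpy as np

def path_graph_basis(n):
    """Eigen-decomposition of the path-graph Laplacian, with u_0 made positive."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)
    if U[:, 0].sum() < 0:                  # solver may return -1/sqrt(N) for u_0
        U[:, 0] = -U[:, 0]
    return lam, U

def translate(lam, U, g_hat, n):
    """(T_n g)(i) = sqrt(N) * sum_l g_hat(lam_l) u_l(n) u_l(i), as in (21)."""
    N = U.shape[0]
    return np.sqrt(N) * (U @ (g_hat(lam) * U[n, :]))

def heat(x):
    """Hypothetical heat kernel used for the illustration."""
    return np.exp(-2.0 * x)

N = 20
lam, U = path_graph_basis(N)
g = U @ heat(lam)                          # the kernel viewed in the vertex domain

T_end = translate(lam, U, heat, 0)         # translated to an end vertex
T_mid = translate(lam, U, heat, N // 2)    # translated to a middle vertex

norm_end = np.linalg.norm(T_end)
norm_mid = np.linalg.norm(T_mid)
```

Both translated kernels have the same sum as `g`, while `norm_end` and `norm_mid` differ, which is one way to see that the operator is not an isometry on this graph.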





Fig. 5. The translated signals (a) $T_{100}\,g$, (b) $T_{200}\,g$, and (c) $T_{2000}\,g$, where
g is the heat kernel shown in Figures 4(a) and 4(b).



D. Modulation and Dilation 

In addition to translation, many classical transform methods
rely on modulation or dilation to localize signals' frequency
content. The classical modulation operator

$$
(M_\omega f)(t) := e^{2\pi i \omega t} f(t) \qquad (23)
$$

represents a translation in the Fourier domain:

$$
\widehat{M_\omega f}(\xi) = \hat{f}(\xi - \omega), \quad \forall \xi \in \mathbb{R}.
$$

One way to define generalized modulation in the graph setting
is to replace the multiplication by a complex exponential (an
eigenfunction of the 1D Laplacian operator) in (23) with a
multiplication by a graph Laplacian eigenvector:

$$
(M_k f)(i) := \sqrt{N}\,u_k(i)\,f(i). \qquad (24)
$$

The generalized modulation (24) is not exactly a translation
in the graph spectral domain due to the discrete and irregular
nature of the spectrum; however, as shown in [42, Figure 3],
if a kernel $\hat{f}(\cdot)$ is localized around 0 in the graph spectral
domain, then $\widehat{M_k f}$ is localized around $\lambda_k$.

For s > 0, dilation or scaling of an analog signal f in the
time domain is given by

$$
(D_s f)(t) := \frac{1}{s} f\!\left(\frac{t}{s}\right). \qquad (25)
$$

We cannot directly generalize (25) to the graph setting, because
$i/s$ is not likely to be in the domain $\mathcal{V}$ for all $i \in \mathcal{V}$.
Instead, we can take the Fourier transform of (25),

$$
\widehat{(D_s f)}(\xi) = \hat{f}(s\xi), \qquad (26)
$$

and generalize (26) to the graph setting. Assuming we start
with a kernel $\hat{g} : \mathbb{R}_+ \to \mathbb{R}$, we can define a generalized graph
dilation by [41]

$$
\widehat{(\mathcal{D}_s g)}(\lambda) := \hat{g}(s\lambda). \qquad (27)
$$

Note that, unlike the generalized modulation (24), the generalized
dilation (27) requires the kernel $\hat{g}(\cdot)$ to be defined on
the entire real line, not just on $\sigma(L)$ or $[0, \lambda_{\max}]$.

Example 3 (Diffusion operators and dilation): The heat diffusion
operator $R = e^{-L}$ is an example of a discrete
diffusion operator (see, e.g., [43] and [14, Section 2.5.5] for
general discussions of discrete diffusions and [24, Section 4.1]
for a formal definition and examples of symmetric diffusion
semigroups). Intuitively, applying different powers $\tau$ of the
heat diffusion operator to a signal f describes the flow of
heat over the graph when the rates of flow are proportional
to the edge weights encoded in L. The signal f represents the
initial amount of heat at each vertex, and $R^{\tau} f = (e^{-\tau L}) f$
represents the amount of heat at each vertex after time $\tau$. The
time variable $\tau$ also provides a notion of scale. When $\tau$ is
small, the entries $(e^{-\tau L})_{i,j}$ for two vertices that are far apart
in the graph are very small, and therefore $((e^{-\tau L}) f)(i)$
depends primarily on the values f(j) for vertices j close to i
in the graph. As $\tau$ increases, $((e^{-\tau L}) f)(i)$ also depends on
the values f(j) for vertices j farther away from i in the graph.
Zhang and Hancock [11] provide more detailed mathematical
justification behind this migration from domination of the local
geometric structures to domination of the global structure of
the graph as $\tau$ increases, as well as a nice illustration of heat
diffusion on a graph in [11, Figure 1].

Using our notations from (14) and (27), we can see that
applying a power $\tau$ of the heat diffusion operator to any signal
$f \in \mathbb{R}^N$ is equivalent to filtering the signal with a dilated heat
kernel:

$$
R^{\tau} f = (e^{-\tau L}) f = \widehat{(\mathcal{D}_\tau g)}(L) f = f * (\mathcal{D}_\tau g),
$$

where the filter is the heat kernel $\hat{g}(\lambda_\ell) = e^{-\lambda_\ell}$, similar to the
one shown in Figure 4(b).
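
The equivalence between powers of $R$ and dilated heat kernels can be sketched on a toy graph. The path graph, the impulse location, and the dyadic times below are assumptions for illustration and do not reproduce the cerebral cortex experiment.

```python
import numpy as np

def path_laplacian(n):
    """Non-normalized Laplacian of an unweighted path graph."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def heat_filter(lam, U, f, tau):
    """Filter f with the dilated heat kernel g_hat(tau * lambda) = exp(-tau * lambda)."""
    return U @ (np.exp(-tau * lam) * (U.T @ f))

N = 16
L = path_laplacian(N)
lam, U = np.linalg.eigh(L)

# Heat diffusion operator R = exp(-L), assembled from the eigendecomposition.
R = (U * np.exp(-lam)) @ U.T

f = np.zeros(N)
f[5] = 1.0                                  # one unit of heat at a single vertex

# Dyadic diffusion times 1, 2, 4, 8, each computed as a dilated-kernel filter.
diffused = {tau: heat_filter(lam, U, f, tau) for tau in (1, 2, 4, 8)}
```

Applying `R` four times to `f` gives the same vector as `diffused[4]`, and the total amount of heat `diffused[tau].sum()` stays equal to one for every time, since the constant vector lies in the null space of $L$.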

In Figure 6, we consider the cerebral cortex graph described
in [41], initialize a unit of energy at the vertex 100 by taking
$f = \delta_{100}$, allow it to diffuse through the network for different
dyadic amounts of time, and measure the amount of energy
that accumulates at each vertex. Note that dyadic powers of
diffusion operators of the form $\{R^{2^{k-1}}\}_{k=1,2,\ldots}$ are of central
importance to diffusion wavelets and diffusion wavelet packets
[24], [44], [45], which we discuss in Section IV.




Fig. 6. Applying different powers of the heat diffusion operator can
be interpreted as graph filtering with a dilated kernel. The original signal
$f = \delta_{100}$ on the cerebral cortex graph is shown in (a); the filtered signals
$\{f * (\mathcal{D}_{2^{k-1}} g)\}_{k=1,2,3,4} = \{R^{2^{k-1}} f\}_{k=1,2,3,4}$ are shown in (b)-(e);
and the different dilated kernels corresponding to the dyadic powers of the
diffusion operator are shown in (f).



E. Graph Coarsening, Downsampling, and Reduction 

Many multiscale transforms for signals on graphs require
successively coarser versions of the original graph
that preserve properties of the original graph such as the
intrinsic geometric structure (e.g., some notion of distance
between vertices), connectivity, graph spectral distribution,
and sparsity. The process of transforming a given
(fine scale) graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, W\}$ into a coarser graph
$\mathcal{G}^{reduced} = \{\mathcal{V}^{reduced}, \mathcal{E}^{reduced}, W^{reduced}\}$ with fewer vertices
and edges, while also preserving the aforementioned
properties, is often referred to as graph coarsening or
coarse-graining [46].

This process can be split into two separate but closely
related subtasks: 1) identifying a reduced set of vertices
$\mathcal{V}^{reduced}$, and 2) assigning edges and weights, $\mathcal{E}^{reduced}$ and
$W^{reduced}$, to connect the new set of vertices. When an
additional constraint that $\mathcal{V}^{reduced} \subset \mathcal{V}$ is imposed, the first
subtask is often referred to as graph downsampling. The
second subtask is often referred to as graph reduction or graph
contraction.

In the special case of a bipartite graph, two subsets can be 
chosen so that every edge connects vertices in two different 
subsets. Thus, for bipartite graphs, there is a natural way to 
downsample by a factor of two, as there exists a notion of 
"every other vertex." 

For non-bipartite graphs, the situation is far more complex,
and a wide range of interesting techniques for the graph
coarsening problem have been proposed by graph theorists,
and, in particular, by the numerical linear algebra community.
To mention just a few, Lafon and Lee [46] downsample based
on diffusion distances and form new edge weights based on
random walk transition probabilities; the greedy seed selection
algorithm of Ron et al. [47] leverages an algebraic distance
measure to downsample the vertices; recursive spectral bisection
[48] repeatedly divides the graph into parts according to
the polarity (signs) of the Fiedler vectors $u_1$ of successive
subgraphs; Narang and Ortega [49] minimize the number
of edges connecting two vertices in the same downsampled
subset; and another generally-applicable method which yields
the natural downsampling on bipartite graphs ([36, Chapter
3.6]) is to partition $\mathcal{V}$ into two subsets according to the polarity
of the components of the graph Laplacian eigenvector $u_{N-1}$
associated with the largest eigenvalue $\lambda_{\max}$. We refer readers
to [47], [50] and references therein for more thorough reviews
of the graph coarsening literature.
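
The last method in the list above, partitioning by the polarity of $u_{N-1}$, can be sketched in a few lines. The 3 x 4 grid graph is an assumption chosen because it is bipartite, so the outcome can be compared with the natural "every other vertex" downsampling.

```python
import numpy as np

def grid_adjacency(rows, cols):
    """Unweighted adjacency matrix of a rows x cols grid graph (bipartite)."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                W[v, v + 1] = W[v + 1, v] = 1.0
            if r + 1 < rows:
                W[v, v + cols] = W[v + cols, v] = 1.0
    return W

def polarity_downsample(W):
    """Keep the vertices where the eigenvector of the largest eigenvalue is nonnegative."""
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)             # eigenvalues in ascending order
    u_max = U[:, -1]                       # eigenvector associated with lambda_max
    return u_max >= 0

W = grid_adjacency(3, 4)
keep = polarity_downsample(W)

# Endpoints of every edge, used to inspect how edges relate to the partition.
edge_i, edge_j = np.nonzero(W)
```

On this grid every edge joins a kept vertex to a discarded one, so the split coincides with the bipartition. Which of the two halves is labeled "kept" depends on the arbitrary sign of the computed eigenvector.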

There are also many interesting connections between graph
coarsening, graph coloring [51], spectral clustering [22], and
nodal domain theory [36, Chapter 3]. Finally, in a closely
related topic, Pesenson (e.g., [52]) has extended the concept of
bandlimited sampling to signals defined on graphs by showing
certain classes of signals can be downsampled on particular
subgraphs and then stably reconstructed from the reduced set
of samples.

IV. Localized, Multiscale Transforms for 
Signals on Graphs 

The increasing prevalence of signals on graphs has triggered
a recent influx of localized transform methods specifically
designed to analyze data on graphs. These include wavelets
on unweighted graphs for analyzing computer network traffic
[53], diffusion wavelets and diffusion wavelet packets [24],
[44], [45], the "top-down" wavelet construction of [54], graph
dependent basis functions for sensor network graphs [55],
lifting based wavelets on graphs [49], [56], multiscale wavelets
on balanced trees [57], spectral graph wavelets [41],
critically-sampled two-channel wavelet filter banks [37], [58], and a
windowed graph Fourier transform [42].

Most of these designs are generalizations of the classical
wavelet filter banks used to analyze signals on Euclidean
domains. The feature that makes the classical wavelet transforms
so useful is their ability to simultaneously localize
signal information in both time (or space) and frequency, and
thus exploit the time-frequency resolution trade-off better than
the Fourier transform. In a similar vein, the desired property
of wavelet transforms on graphs is to localize graph signal
contents in both the vertex and graph spectral domains. In the
classical setting, locality is measured in terms of the "spread"
of the signal in time and frequency, and uncertainty principles
(see [59, Sec. 2.6.2]) describe the trade-off between time and
frequency resolution. Whether such a trade-off exists for graph
signals remains an open question. However, some recent works
have begun to define different ways to measure the "spread"
of graph signals in both domains. For example, [60] defines
the spatial spread of any signal f around a center vertex i on
a graph $\mathcal{G}$ as



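$$
\Delta_{\mathcal{G},i}^{2}(f) := \frac{1}{\|f\|_2^2} \sum_{j \in \mathcal{V}} [d_{\mathcal{G}}(i,j)]^{2}\,[f(j)]^{2}. \qquad (28)
$$

Here, $\{[f(j)]^2 / \|f\|_2^2\}_{j=1,2,\ldots,N}$ can be interpreted as a probability
mass function (pmf) of signal f, and $\Delta_{\mathcal{G},i}^{2}(f)$ is the
variance of the geodesic distance function $d_{\mathcal{G}}(i, \cdot) : \mathcal{V} \to \mathbb{R}$ at
node i, in terms of this spatial pmf.

Similarly, the spectral spread of a graph signal can be
defined as:

$$
\Delta_{\sigma}^{2}(f) := \min_{\mu \in \mathbb{R}_+} \left\{ \frac{1}{\|f\|_2^2} \sum_{\lambda \in \sigma(L)} \left[\sqrt{\lambda} - \sqrt{\mu}\right]^{2} [\hat{f}(\lambda)]^{2} \right\}, \qquad (29)
$$

where $\{[\hat{f}(\lambda)]^2 / \|f\|_2^2\}_{\lambda = \lambda_0, \lambda_1, \ldots, \lambda_{\max}}$ is the pmf of f across
the spectrum of the Laplacian matrix, and $\sqrt{\mu}$ and $\Delta_{\sigma}^{2}(f)$ are
the mean and variance of $\sqrt{\lambda}$, respectively, in the distribution
given by this spectral pmf.$^{7}$ If we do not minimize over all
$\mu$ but rather fix $\mu = 0$ and also use the normalized Laplacian
matrix $\tilde{L}$ instead of $L$, the definition of spectral spread in (29)
reduces to the one proposed in [60].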
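
Both spreads are straightforward to compute on a small graph. In the sketch below, the path graph and the two test signals are assumptions; the minimization over $\mu$ in (29) is carried out in closed form, since the minimizing $\sqrt{\mu}$ is the mean of $\sqrt{\lambda}$ under the spectral pmf. The graph is assumed to be connected so that all hop distances are finite.

```python
import numpy as np
from collections import deque

def hop_distances(W, source):
    """Shortest-path distances in number of edges, by breadth-first search."""
    n = W.shape[0]
    dist = np.full(n, np.inf)
    dist[source] = 0.0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in np.flatnonzero(W[v] > 0):
            if np.isinf(dist[w]):
                dist[w] = dist[v] + 1.0
                queue.append(w)
    return dist

def spatial_spread(W, f, i):
    """Spatial spread (28) of f around vertex i (connected graph assumed)."""
    d = hop_distances(W, i)
    return float(np.sum(d**2 * f**2) / np.sum(f**2))

def spectral_spread(lam, U, f):
    """Spectral spread (29): variance of sqrt(lambda) under the spectral pmf of f."""
    p = (U.T @ f) ** 2
    p = p / p.sum()
    root = np.sqrt(np.clip(lam, 0.0, None))   # guard against tiny negative eigenvalues
    mean_root = np.sum(p * root)              # the minimizing sqrt(mu)
    return float(np.sum(p * (root - mean_root) ** 2))

N = 10
W = np.zeros((N, N))
idx = np.arange(N - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0       # path graph
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

delta = np.zeros(N)
delta[4] = 1.0                                # perfectly localized in the vertex domain
mode = U[:, 3]                                # perfectly localized in the spectral domain

spreads_delta = (spatial_spread(W, delta, 4), spectral_spread(lam, U, delta))
spreads_mode = (spatial_spread(W, mode, 4), spectral_spread(lam, U, mode))
```

The impulse has zero spatial spread but positive spectral spread, while the eigenvector has (numerically) zero spectral spread but positive spatial spread, the two extreme cases of the localization trade-off.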

Depending on the application under consideration, other
desirable features of a graph wavelet transform may include
perfect reconstruction, critical sampling, orthogonal expansion,
and a multi-resolution decomposition [37].

In the remainder of this section, we categorize the existing 
graph transform designs and provide simple examples. The 
graph wavelet transform designs can broadly be divided into 
two types: vertex domain designs and graph spectral domain 
designs. 



A. Vertex Domain Designs 

The vertex domain designs of graph wavelet transforms are
based on the spatial features of the graph, such as node connectivity
and distances between vertices. Most of these localized
transforms can be viewed as particular instances of filtering
in the vertex domain, as in (18), where the output at each
node can be computed from the samples within some K-hop
neighborhood around the node. The graph spectral properties
of these transforms are not explicitly designed. Examples of
vertex domain designs include random transforms [55], graph
wavelets [53], lifting based wavelets [49], [61], [62], and the
tree wavelets [57].

7 Note that the definitions of spread presented here are heuristically defined
and do not have a well-understood theoretical background. If the graph is not
regular, the choice of which Laplacian matrix ($L$ or $\tilde{L}$) to use for computing
spectral spreads also affects the results. The purpose of these definitions and
the subsequent examples is to show that a trade-off exists between spatial and
spectral localization in graph wavelets.

The random transforms [55] for unweighted graphs compute
either a weighted average or a weighted difference at each
node in the graph with respect to a k-hop neighborhood around
it. Thus, the filter at each node has a constant, non-zero weight
c within the k-hop neighborhood and zero weight outside,
where the parameter c is chosen so as to guarantee invertibility
of the transform.

The graph wavelets of Crovella and Kolaczyk [53] are
functions $\psi_{k,i} : \mathcal{V} \to \mathbb{R}$, localized with respect to a range
of scale/location indices (k,i), which at a minimum satisfy
$\sum_{j \in \mathcal{V}} \psi_{k,i}(j) = 0$ (i.e., a zero DC response). This graph
wavelet transform is described in more detail in Section IV-C.

Lifting based transforms for graphs [49], [61], [62] are
extensions of the lifting wavelets originally proposed for 1D
signals by Sweldens [63]. In this approach, the vertex set is
first partitioned into sets of even and odd nodes, $\mathcal{V} = \mathcal{V}_O \cup \mathcal{V}_E$.
Each odd node computes its prediction coefficient using its
own data and data from its even neighbors. Then each even
node computes its update coefficients using its own data and
the prediction coefficients of its neighboring odd nodes.
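
The predict and update steps of a lifting transform can be sketched as follows. This is only an illustration under stated assumptions: the prediction is a plain average over even neighbors and the update uses a hypothetical weight of 1/2, neither of which is claimed to be the filter design of [49], [61], or [62]. The even/odd split is by index parity on a path graph, so every node has at least one neighbor of the opposite parity.

```python
import numpy as np

def lifting_operators(W, odd):
    """Row-normalized predict (odd <- even) and update (even <- odd) matrices."""
    even = ~odd
    A = (W > 0).astype(float)
    P = A[np.ix_(odd, even)]
    P = P / P.sum(axis=1, keepdims=True)          # average over even neighbors
    Q = A[np.ix_(even, odd)]
    Q = 0.5 * Q / Q.sum(axis=1, keepdims=True)    # hypothetical update weight 1/2
    return P, Q

def lifting_forward(f, odd, P, Q):
    """Odd nodes store prediction errors d; even nodes store updated values s."""
    d = f[odd] - P @ f[~odd]
    s = f[~odd] + Q @ d
    return s, d

def lifting_inverse(s, d, odd, P, Q):
    """Undo the update step, then undo the prediction step."""
    f = np.empty(odd.size)
    f[~odd] = s - Q @ d
    f[odd] = d + P @ f[~odd]
    return f

N = 9
W = np.zeros((N, N))
idx = np.arange(N - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0           # path graph
odd = np.arange(N) % 2 == 1

P, Q = lifting_operators(W, odd)
f = np.random.default_rng(1).standard_normal(N)
s, d = lifting_forward(f, odd, P, Q)
f_rec = lifting_inverse(s, d, odd, P, Q)

# A constant signal is predicted exactly, so its detail coefficients vanish.
s_const, d_const = lifting_forward(np.ones(N), odd, P, Q)
```

The transform is invertible by construction, whatever predict and update weights are chosen, because each step modifies one subset using only values from the other subset.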

In [57], Gavish et al. construct tree wavelets by building a
balanced hierarchical tree from the data defined on graphs, and
then generating orthonormal bases for the partitions defined at
each level of the tree using a modified version of the standard
one-dimensional wavelet filtering and decimation scheme.

B. Graph Spectral Domain Designs 

The graph spectral domain designs of graph wavelets are
based on the spectral features of the graph, e.g., in terms of the
eigenvalues and eigenvectors of one of the graph matrices defined
in Section II. Notable examples in this category include
diffusion wavelets [24], [44], spectral graph wavelets [41], and
graph-QMF filter banks [37]. The general idea of the graph
spectral designs is to construct bases that are localized in both
the vertex and graph spectral domains.

The diffusion wavelets [24], [44], for example, are based
on compressed representations of powers of a diffusion operator,
such as the one discussed in Example 3. The localized
basis functions at each resolution level are downsampled and
then orthogonalized through a variation of the Gram-Schmidt
orthogonalization scheme.

The spectral graph wavelets of [41] are dilated, translated
versions of a bandpass kernel designed in the graph spectral
domain of the non-normalized graph Laplacian L. They are
discussed further in Section IV-C.

Another graph spectral design is the two-channel
graph-QMF filter bank proposed for bipartite graphs in
[37]. The resulting transform is orthogonal and
critically-sampled, and also yields perfect reconstruction. In this
design, the analysis and synthesis filters at each scale are
designed using a single prototype transfer function $\hat{h}(\lambda)$,
which satisfies:

$$
\hat{h}^{2}(\lambda) + \hat{h}^{2}(2 - \lambda) = 2, \qquad (30)
$$

where $\lambda$ is an eigenvalue in the normalized Laplacian spectrum.
The design extends to any arbitrary graph via a bipartite
subgraph decomposition.
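
Condition (30) is easy to check for a candidate kernel. The sketch below uses $\hat{h}(\lambda) = \sqrt{2 - \lambda}$, a hypothetical low-pass prototype picked only because it satisfies (30) by inspection; it is not the kernel designed in [37]. The normalized Laplacian spectrum is taken from a path graph, which is bipartite.

```python
import numpy as np

def normalized_laplacian(W):
    """Normalized graph Laplacian I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    scale = 1.0 / np.sqrt(d)
    return np.eye(len(d)) - scale[:, None] * W * scale[None, :]

def h_proto(lam):
    """Hypothetical prototype kernel sqrt(2 - lambda) on [0, 2]."""
    return np.sqrt(np.clip(2.0 - lam, 0.0, None))

N = 8
W = np.zeros((N, N))
idx = np.arange(N - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0       # path graph (bipartite)

lam = np.linalg.eigvalsh(normalized_laplacian(W))

# Residual of condition (30) at every eigenvalue of the normalized Laplacian.
residual = h_proto(lam) ** 2 + h_proto(2.0 - lam) ** 2 - 2.0
```

For a bipartite graph the normalized Laplacian spectrum is symmetric about 1, so both $\lambda$ and $2 - \lambda$ appearing in (30) are eigenvalues.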

C. Examples of Graph Wavelet Designs 

In order to build more intuition about graph wavelets, we 
present some examples using one vertex domain design and 
one graph spectral domain design. 

For the vertex domain design, we use the graph wavelet
transform (CKWT) of Crovella and Kolaczyk [53] as an
example. These wavelets are based on the geodesic or shortest-path
distance $d_{\mathcal{G}}(i,j)$. Define $\partial\mathcal{N}(i,\tau)$ to be the set of all
vertices $j \in \mathcal{V}$ such that $d_{\mathcal{G}}(i,j) = \tau$. Then the wavelet
function $\psi_{k,i}^{CKWT} : \mathcal{V} \to \mathbb{R}$ at scale k and center vertex $i \in \mathcal{V}$
can be written as

$$
\psi_{k,i}^{CKWT}(j) = \frac{a_{k,\tau}}{|\partial\mathcal{N}(i,\tau)|}, \quad \forall j \in \partial\mathcal{N}(i,\tau), \qquad (31)
$$

for some constants $\{a_{k,\tau}\}_{\tau=0,1,\ldots,k}$. Thus, each wavelet is
constant across all vertices $j \in \partial\mathcal{N}(i,\tau)$ that are the same
distance from the center vertex i, and the value of the wavelet
at the vertices in $\partial\mathcal{N}(i,\tau)$ depends on the distance $\tau$. If
$\tau > k$, $a_{k,\tau} = 0$, so that for any k, the function $\psi_{k,i}^{CKWT}$ is
exactly supported on a k-hop localized neighborhood around
the center vertex i. The constants $a_{k,\tau}$ in (31) also satisfy
$\sum_{\tau=0}^{k} a_{k,\tau} = 0$, and can be computed from any continuous
wavelet function $\psi^{[0,1)}(\cdot)$ supported on the interval [0,1) by
taking $a_{k,\tau}$ to be the average of $\psi^{[0,1)}(\cdot)$ on the sub-intervals
$I_{k,\tau} = \left[\frac{\tau}{k+1}, \frac{\tau+1}{k+1}\right]$. In our examples in Figures 7 and 8, we
take $\psi^{[0,1)}(\cdot)$ to be the continuous Mexican hat wavelet. We
denote the entire graph wavelet transform at a given scale k
as $\Psi_{k}^{CKWT} := [\psi_{k,1}^{CKWT}, \psi_{k,2}^{CKWT}, \ldots, \psi_{k,N}^{CKWT}]$.
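
The ring-wise construction in (31) can be sketched with a breadth-first search. The ring graph and the constants `a` below are hypothetical; they are chosen by hand to sum to zero rather than derived from averages of a Mexican hat wavelet.

```python
from collections import deque

def hop_rings(adj, center, k):
    """rings[tau] lists the vertices at shortest-path distance tau <= k from center."""
    dist = {center: 0}
    rings = [[center]] + [[] for _ in range(k)]
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue                       # do not explore beyond k hops
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                rings[dist[w]].append(w)
                queue.append(w)
    return rings

def ckwt_wavelet(adj, center, a):
    """Wavelet (31) at scale k = len(a) - 1: value a[tau] / |ring| on each ring."""
    k = len(a) - 1
    psi = [0.0] * len(adj)
    for tau, ring in enumerate(hop_rings(adj, center, k)):
        for j in ring:
            psi[j] = a[tau] / len(ring)
    return psi

# Ring graph on 12 vertices, stored as adjacency lists.
n = 12
adj = [[(v - 1) % n, (v + 1) % n] for v in range(n)]

a = [1.0, -0.25, -0.75]                    # hypothetical constants a_{2,tau}, summing to zero
psi = ckwt_wavelet(adj, 0, a)
```

Since each ring contributes exactly $a_{k,\tau}$ to the sum of the wavelet, the zero DC response follows directly from the constants summing to zero, provided every ring up to distance k is nonempty.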

For the graph spectral domain design, we use the spectral
graph wavelet transform (SGWT) of [41] as an example.
The SGWT consists of one scaling function centered at each
vertex, and K wavelets centered at each vertex, at scales
$\{t_1, t_2, \ldots, t_K\} \in \mathbb{R}_+$. The scaling functions are translated
low-pass kernels:

$$
\psi_{scal,i}^{SGWT} := T_i h = \hat{h}(L)\delta_i,
$$

where the generalized translation $T_i$ is defined in (21), and the
kernel $\hat{h}(\lambda)$ is a low-pass filter. The wavelet at scale $t_k$ and
center vertex i is defined as

$$
\psi_{t_k,i}^{SGWT} := T_i \mathcal{D}_{t_k} g = \widehat{\mathcal{D}_{t_k} g}(L)\delta_i,
$$

where the generalized dilation $\mathcal{D}_{t_k}$ is defined in (27), and $\hat{g}(\lambda)$
is a band-pass kernel satisfying $\hat{g}(0) = 0$, $\lim_{\lambda \to \infty} \hat{g}(\lambda) = 0$,
and an admissibility condition [41]. We denote the SGWT
transform at scale $t_k$ as

$$
\Psi_{t_k}^{SGWT} := [\psi_{t_k,1}^{SGWT}, \psi_{t_k,2}^{SGWT}, \ldots, \psi_{t_k,N}^{SGWT}],
$$

so that the entire transform $\Psi^{SGWT} : \mathbb{R}^N \to \mathbb{R}^{N(K+1)}$ is
given by

$$
\Psi^{SGWT} := [\Psi_{scal}^{SGWT}; \Psi_{t_1}^{SGWT}; \ldots; \Psi_{t_K}^{SGWT}].
$$
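
A compact sketch of this construction is given below. The kernels $\hat{g}(x) = x e^{-x}$ and $\hat{h}(x) = e^{-x}$, the three scales, and the path graph are all assumptions made for illustration; they are not the kernels or scales produced by the SGWT toolbox. Each atom is formed as $\hat{g}(t_k L)\delta_i$, following the displayed definitions.

```python
import numpy as np

def sgwt_operator(lam, U, scales, g_hat, h_hat):
    """Stack the scaling block and one wavelet block per scale: N(K+1) x N."""
    blocks = [(U * h_hat(lam)) @ U.T]                   # rows are scaling functions
    blocks += [(U * g_hat(t * lam)) @ U.T for t in scales]
    return np.vstack(blocks)

def g_hat(x):
    """Hypothetical band-pass kernel with g_hat(0) = 0 and decay at infinity."""
    return x * np.exp(-x)

def h_hat(x):
    """Hypothetical low-pass scaling kernel."""
    return np.exp(-x)

N = 10
W = np.zeros((N, N))
idx = np.arange(N - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0                 # path graph
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

scales = [0.5, 2.0, 8.0]                                # hypothetical t_1, t_2, t_3
Psi = sgwt_operator(lam, U, scales, g_hat, h_hat)

f = np.sin(np.arange(N))
coeffs = Psi @ f                                        # N(K+1) transform coefficients
```

Because $\hat{g}(0) = 0$ and the constant vector is the eigenvector for $\lambda_0 = 0$ on a connected graph, every wavelet row of `Psi` sums to zero in this sketch, while the scaling rows sum to $\hat{h}(0) = 1$.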



We now compute the spatial and spectral spreads of the
two graph wavelet transforms presented above. Unlike in
the classical setting, the basis functions in a graph wavelet
transform are not space-invariant; i.e., the spreads of two
wavelets $\psi_{k,i_1}$ and $\psi_{k,i_2}$ at the same scale are not necessarily
the same. Therefore, the spatial spread of a graph transform
cannot be measured by computing the spreads of only one
wavelet. In our analysis, we compute the spatial spread of
a transform $\Psi_k$ at a given scale k to be the average of
the spatial spreads (28) over all scale k wavelet (or scaling)
functions. Similarly, the spectral spread of the graph transform
also changes with location. Therefore, we first compute

$$
|\hat{\Psi}_k(\lambda)|^{2} := \frac{1}{N} \sum_{i=1}^{N} |\hat{\psi}_{k,i}(\lambda)|^{2}, \qquad (32)
$$

and then take $|\hat{f}(\lambda)|^{2} = |\hat{\Psi}_k(\lambda)|^{2}$ in (29) to compute the
average spectral spread of $\Psi_k$.

The spatial and spectral spreads of both the CKWT and
SGWT at different scales are shown in Figure 7. The graphs
used in this example are random d-regular graphs. Observe
that in Figure 7, the CKWT wavelets are located to the right
of the SGWT wavelets on the horizontal (spectral) axis, and
below them on the vertical (spatial) axis, which implies that, in
this example, the CKWT wavelets are less localized spectrally
and more localized spatially than the SGWT wavelets. This
analysis provides an empirical understanding of the trade-off
between the spatial and spectral resolutions of signals defined
on graphs.

[Figure 7 plot: "Spatial and spectral localization on random-regular graphs". Horizontal axis: spectral spread; vertical axis: spatial spread. Legend: Crovella wavelets (Mexican-hat); SGWT wavelets.]

Fig. 7. The average spatial and spectral spreads of two example wavelet
transforms on 5 instances of d-regular random graphs (size N = 300, degree
d = 5). The coordinates of each point in this figure are the average spatial
and spectral spreads across all wavelets at a given scale.



Next, to empirically demonstrate the ability of these graph
wavelet transforms to efficiently represent piecewise smooth
signals on graphs, we compute the graph wavelet coefficients
of the piecewise smooth signal with a sharp discontinuity
shown in Figure 8(a) on the unweighted Minnesota road graph,
where the color of a node represents the value of the signal at
that vertex. We use the CKWT with scales k = 1, 2, ..., 10,
and the SGWT with 5 wavelet scales, as well as a scaling
kernel. The bandpass wavelet kernel, scaling kernel, and values
of the scales $t_1$, $t_2$, $t_3$, and $t_4$ are all designed by the SGWT
toolbox [41]. The CKWT wavelet coefficients at scales 2
and 4 are shown in Figures 8(b) and 8(c), and the SGWT
scaling coefficients and wavelet coefficients at scales $t_2$ and
$t_4$ are shown in Figures 8(d)-(f), respectively. Observe that
for both transforms, the high-magnitude output coefficients are
concentrated mostly near the discontinuity. This implies that
these graph wavelet transforms are able to localize the high-pass
information of the signal in the spatial domain, which
the graph Fourier transform or other global transforms cannot
do.



Fig. 8. (a) A piecewise smooth signal f with a severe discontinuity on the
unweighted Minnesota graph. (b)-(c) Wavelet coefficients of two scales of the
CKWT. (d) Scaling coefficients of the SGWT. (e)-(f) Wavelet coefficients
of two scales of the SGWT. In both cases, the high-magnitude wavelet
coefficients cluster around the discontinuity.



V. Summary, Open Issues, and Extensions

We presented a generic framework for processing data on
graphs, and we surveyed recent developments in the area of
graph signal processing. In particular, we reviewed ways to
generalize elementary operators such as filtering, convolution,
and translation to the graph setting. Such operations represent the
core of graph signal processing algorithms, and they underlie
the localized, multiscale transforms we discussed in Section
IV. For many of the generalized operators defined in Section
III and the localized, multiscale transforms reviewed in Section
IV, classical signal processing intuition from Euclidean spaces
can be fairly directly extended to the graph setting. For
example, we saw in Section II-C how the notion of frequency
extends nicely to the graph setting. However, signals and
transforms on graphs can also have surprising properties due
to the irregularity of the data domain. Moreover, these are
by no means the only conceivable ways to generalize these
operators and transforms to the graph setting. Thus, quite a few
challenges remain ahead. In this section, we briefly mention
a few important open issues and possible extensions.

A. Open Issues

• Because all of the signal processing methods described
in this paper incorporate the graph structure in some
way, construction of the underlying graph is extremely
important. Yet, relatively little is known about how the
construction of the graph affects properties of the localized,
multiscale transforms for signals on graphs.

• As mentioned in Section II-F, it is not always clear when
or why we should use the normalized graph Laplacian
eigenvectors, the non-normalized graph Laplacian eigenvectors,
or some other basis as the graph filtering basis.
Similarly, in the vertex domain, a number of different
distances, including the geodesic/shortest-path distance,
the resistance distance [64], the diffusion distance [46],
and algebraic distances [47], have useful properties, but it
is not always clear which is the best to use in constructing
or analyzing transform methods.

• Transform operators are only useful in high-dimensional
data analysis if the computational complexity of applying
the operator and its adjoint scales gracefully with the
size of the signal. This fact is confirmed, for example,
by the prevalence of fast Fourier transforms and
other efficient computational algorithms throughout the
signal processing literature. Most of the transforms for
signals on graphs involve computations requiring the
eigenvectors of the graph Laplacian or the normalized
graph Laplacian. However, it is not practical to explicitly
compute these eigenvectors for extremely large graphs,
as the computational complexity of doing so does not
scale gracefully with the size of the graph. Thus, an
important area of research is approximate computation
techniques for signal processing on graphs. Efficient
numerical implementations for certain classes of graph
operators have been suggested using polynomial approximations
[4], [40], [41] and Krylov methods [11], but
plenty of numerical issues remain open, including, e.g.,
a fast graph Fourier transform implementation.

• In Euclidean data domains, there is a deep mathematical
theory of approximation linking properties of classes of
signals to properties of their wavelet transform coefficients
(see, e.g., [65]). A major open issue in the field
of signal processing on graphs is how to link structural
properties of graph signals and their underlying graphs
to properties (such as sparsity and localization) of the
generalized operators and transform coefficients. Such a
theory could inform transform designs, and help identify
which transforms may be better suited to which applications.
One issue at the heart of the matter is the need
to better understand localization of signals in both the
vertex and graph spectral domains. As discussed briefly in
Section IV, even defining appropriate notions of spreads
in these domains is highly non-trivial. Moreover, unlike
in the classical Euclidean settings, the graph Laplacian
eigenvectors are often highly non-localized, making it
more difficult to precisely identify the trade-off between
resolution in the vertex domain and resolution in the
graph spectral domain. Agaskar and Lu [60] have begun
to define such localization notions and study the resolution
trade-off.
B. Extensions

The signal processing techniques we have described are
focused on extracting information from a static signal on a
static, weighted, undirected graph. Some clear extensions of
this framework include: 1) considering directed graphs, as is
done for example in [66]; 2) considering time series of data
on each vertex in a graph; 3) considering a time-varying series
of underlying graphs, as is done for example in [67]; or any
combination of these.

Finally, while the number of new analytic techniques for 
signals on graphs has been steadily increasing over the past 
decade, the application of these techniques to real science 
and engineering problems is still in its infancy. We believe 
the number of potential applications is vast, and hope to 
witness increased utilization of these important theoretical 
developments over the coming decade. 